Type Prediction in Noisy RDF Knowledge Bases Using Hierarchical Multilabel Classification with Graph and Latent Features
Authors
Abstract
Semantic Web knowledge bases, in particular large cross-domain datasets, are often noisy, incorrect, and incomplete with respect to type information. As previous work shows, this incompleteness can be reduced with automatic type prediction methods. Most knowledge bases contain an ontology defining a type hierarchy, and, in general, entities are allowed to have multiple types (classes of an instance assigned with the rdf:type relation). In this paper, we exploit these characteristics and formulate the type prediction problem as hierarchical multilabel classification, where the labels are types. We evaluate different sets of features, including entity embeddings, which can be extracted exclusively from the knowledge graph. We propose SLCN, a modification of the local classifier per node approach, which performs feature selection, instance sampling, and class balancing for each local classifier with the objective of improving scalability. Furthermore, we explore different variants of creating features for the classifier, including both graph and latent features. We compare the performance of our proposed method with the state-of-the-art type prediction approach and popular hierarchical multilabel classifiers, and report on experiments with large-scale cross-domain RDF datasets.
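To make the local classifier per node idea more concrete, the following is a minimal sketch of how a balanced training set for one node of a type hierarchy might be assembled. It is not the SLCN implementation: the toy hierarchy, the entities, the function names, and the choice of negatives (entities that carry the parent type but not the node itself) are all assumptions made for illustration, and feature selection and the classifier itself are omitted.

```python
import random

# Hypothetical toy type hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "Person": "Thing",
    "Place": "Thing",
    "Artist": "Person",
    "Athlete": "Person",
    "City": "Place",
}

def ancestors(t):
    """Return all strict ancestors of type t, nearest first."""
    out = []
    while t in PARENT:
        t = PARENT[t]
        out.append(t)
    return out

def close_types(types):
    """Add every ancestor so a label set is consistent with the hierarchy."""
    closed = set(types)
    for t in types:
        closed.update(ancestors(t))
    return closed

def local_training_set(node, labels, rng):
    """Build a balanced binary training set for the local classifier of `node`.

    Assumed policy: only entities carrying the parent type are considered.
    Positives carry `node`, negatives do not. The larger class is
    undersampled so both classes have the same size.
    """
    parent = PARENT[node]
    pool = [e for e, ts in labels.items() if parent in ts]
    pos = sorted(e for e in pool if node in labels[e])
    neg = sorted(e for e in pool if node not in labels[e])
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)

# Hypothetical entities with their asserted (most specific) types.
asserted = {
    "e1": {"Artist"},
    "e2": {"Athlete"},
    "e3": {"Athlete"},
    "e4": {"Person"},
    "e5": {"City"},
}
labels = {e: close_types(ts) for e, ts in asserted.items()}
pos, neg = local_training_set("Artist", labels, random.Random(0))
```

In this sketch each node sees only the instances relevant to its position in the hierarchy, which is one plausible way that per-classifier sampling and balancing could keep local training sets small.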
Related resources
A hierarchical multi-label classification ant colony algorithm for protein function prediction
This paper proposes a novel Ant Colony Optimisation algorithm (ACO) tailored for the hierarchical multilabel classification problem of protein function prediction. This problem is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve the current biological know...
Hierarchical Multilabel Protein Function Prediction Using Local Neural Networks
Protein function predictions are usually treated as classification problems where each function is regarded as a class label. However, different from conventional classification problems, they have some specificities that make the classification task more complex. First, the problem classes (protein functions) are usually hierarchically structured, with superclasses and subclasses. Second, prot...
Neuro-symbolic representation learning on biological knowledge graphs
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We deve...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Basic methods in this field used obvious traffic features such as port number and protocol type to classify the traffic type. However, recent changes in applicat...
Multilabel Classification Evaluation using Ontology Information
Multilabel classification using ontology information is an emerging research area that combines machine learning methods with knowledge models. The performance assessment of such classification systems poses new challenges. We propose an evaluation measure that considers the mapping of label sets to their groundtruth and allows for the incorporation of real world knowledge. A distance-based mea...
Journal title:
- International Journal on Artificial Intelligence Tools
Volume 26
Publication date: 2017